Mandarin connected digits recognition for whispered speech
Abstract
This paper discusses the acoustic characteristics and recognition of whispered speech. A Mandarin digit database is built in both normal and whispered speech. The collected material is analyzed to verify the characteristics of the two kinds of speech and the differences between them. Cross recognition is carried out using normal and whispered speech as training and testing data respectively, and the detailed recognition results are analyzed using confusion matrices. The results show that models trained on normal speech are not suitable for recognizing whispered speech, and that the word correct rate for whispered speech is closely related to its acoustic characteristics. Some possible solutions are also suggested.
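The confusion-matrix analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes the reference and recognized digits have already been aligned one-to-one (so insertions and deletions are ignored), and the function names and sample pairs are hypothetical.

```python
from collections import Counter

# Labels for the ten digits; the real system's label set is an assumption here.
DIGITS = [str(d) for d in range(10)]

def confusion_matrix(pairs, labels=DIGITS):
    """Count aligned (reference, recognized) pairs.

    Rows are reference digits, columns are recognized digits.
    """
    counts = Counter(pairs)
    return [[counts[(ref, hyp)] for hyp in labels] for ref in labels]

def word_correct_rate(matrix):
    """Fraction of reference digits recognized correctly (diagonal / total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Made-up aligned pairs for illustration only, not data from the paper.
pairs = [("1", "1"), ("7", "1"), ("7", "7"), ("2", "2"), ("6", "9"), ("9", "9")]
matrix = confusion_matrix(pairs)
rate = word_correct_rate(matrix)
```

Off-diagonal cells such as `matrix[7][1]` show which digits are confused with which, which is the kind of detail a single correct-rate figure hides.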
Similar resources
A whispered Mandarin corpus for speech technology applications
Whispered speech is a natural mode of speech in which voicing is absent – its acoustics differ significantly from normally spoken speech or so-called neutral speech, such that it is challenging to use only neutral speech to build speech processing and automatic recognition systems that can deal effectively with whisper. At the same time, humans can naturally produce and perceive whispered speec...
An Efficient Method for Removing Deletion Errors in Quickly-spoken Connected Mandarin Digit String Speech Recognition
Connected Mandarin digit string speech, especially at rapid spoken rate, is very difficult to recognize correctly. In this paper, a new training method named neighboring digits pattern is proposed in order to eliminate most of deletion errors which frequently occur in Mandarin digits speech recognition at high speaking rate when we have enough quickly-spoken speech data as the training set. The...
Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context....
Transfer Learning with Bottleneck Feature Networks for Whispered Speech Recognition
Previous work on whispered speech recognition has shown that acoustic models (AM) trained on whispered speech can somewhat classify unwhispered (neutral) speech sounds, but not vice versa. In fact, AMs trained purely on neutral speech completely fail to recognize whispered speech. Meanwhile, recipes used to train neutral AMs will work just as well for whispered speech, but such methods require ...
Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin
Whispering is commonly used when one needs to speak softly (for instance, in a library). Whispered speech mainly differs from neutral speech in that voicing, and thus its acoustic correlate F0, is absent. It is well known that in tonal languages such as Mandarin, tone identity is primarily conveyed by the F0 contour. Previous works also suggest that secondary correlates are both consistent and ...